Algorithms for Generalized Topic Modeling

نویسندگان

  • Avrim Blum
  • Nika Haghtalab
چکیده

Recently there has been significant activity in developing algorithms with provable guarantees for topic modeling. In this work we consider a broad generalization of the traditional topic modeling framework, where we no longer assume that words are drawn i.i.d. and instead view a topic as a complex distribution over sequences of paragraphs. Since one could not hope to even represent such a distribution in general (even if paragraphs are given using some natural feature representation), we aim instead to directly learn a predictor that given a new document, accurately predicts its topic mixture, without learning the distributions explicitly. We present several natural conditions under which one can do this from unlabeled data only, and give efficient algorithms to do so, also discussing issues such as noise tolerance and sample complexity. More generally, our model can be viewed as a generalization of the multi-view or co-training setting in machine learning.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Modeling the Time Windows Vehicle Routing Problem in Cross-Docking Strategy Using Two Meta-Heuristic Algorithms

   In cross docking strategy, arrived products are immediately classified, sorted and organized with respect to their destination. Among all the problems related to this strategy, the vehicle routing problem (VRP) is very important and of special attention in modern technology. This paper addresses the particular type of VRP, called VRPCDTW, considering a time limitation for each customer/retai...

متن کامل

Heuristic and exact algorithms for Generalized Bin Covering Problem

In this paper, we study the Generalized Bin Covering problem. For this problem an exact algorithm is introduced which can nd optimal solution for small scale instances. To nd a solution near optimal for large scale instances, a heuristic algorithm has been proposed. By computational experiments, the eciency of the heuristic algorithm is assessed.

متن کامل

COVERT Based Algorithms for Solving the Generalized Tardiness Flow Shop Problems

Four heuristic algorithms are developed for solving the generalized version of tardiness flow shop problems. We consider the generalized tardiness flow shop model with minimization of the total tardiness as its performance measure. We modify the concept of cost over time (COVERT) for the generalized version of the flow shop tardiness model and employ this concept for developing four algorithms....

متن کامل

Topic Modeling and Classification of Cyberspace Papers Using Text Mining

The global cyberspace networks provide individuals with platforms to can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and many more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...

متن کامل

A Set of Algorithms for Solving the Generalized Tardiness Flowshop Problems

This paper considers the problem of scheduling n jobs in the generalized tardiness flow shop problem with m machines. Seven algorithms are developed for finding a schedule with minimum total tardiness of jobs in the generalized flow shop problem. Two simple rules, the shortest processing time (SPT), and the earliest due date (EDD) sequencing rules, are modified and employed as the core of seque...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017